Overview

Dataset statistics

Number of variables: 11
Number of observations: 149999
Missing cells: 3924
Missing cells (%): 0.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 12.6 MiB
Average record size in memory: 88.0 B
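
The derived figures in this table can be cross-checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that the total size is simply observations × average record size; both are assumptions about how the report arrives at its numbers, not something the report states.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 11
n_observations = 149_999
missing_cells = 3_924
avg_record_bytes = 88.0

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size = rows * average record size, converted to MiB.
total_mib = n_observations * avg_record_bytes / 2**20

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Under these assumptions both values round to the reported 0.2% and 12.6 MiB.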

Variable types

NUM: 10
BOOL: 1

Reproduction

Analysis started: 2020-08-11 03:49:30.890182
Analysis finished: 2020-08-11 03:50:15.055461
Duration: 44.17 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Unnamed: 0 is highly correlated with df_index [High correlation]
df_index is highly correlated with Unnamed: 0 [High correlation]
NumberOfDependents has 3924 (2.6%) missing values [Missing]
RevolvingUtilizationOfUnsecuredLines is highly skewed (γ1 = 97.63124905) [Skewed]
NumberOfTime30-59DaysPastDueNotWorse is highly skewed (γ1 = 22.59703929) [Skewed]
DebtRatio is highly skewed (γ1 = 95.15750074) [Skewed]
MonthlyIncome is highly skewed (γ1 = 29.41205776) [Skewed]
df_index has unique values [Unique]
Unnamed: 0 has unique values [Unique]
RevolvingUtilizationOfUnsecuredLines has 10878 (7.3%) zeros [Zeros]
NumberOfTime30-59DaysPastDueNotWorse has 126018 (84.0%) zeros [Zeros]
DebtRatio has 4113 (2.7%) zeros [Zeros]
MonthlyIncome has 1634 (1.1%) zeros [Zeros]
NumberOfOpenCreditLinesAndLoans has 1888 (1.3%) zeros [Zeros]
NumberRealEstateLoansOrLines has 56188 (37.5%) zeros [Zeros]
NumberOfDependents has 86902 (57.9%) zeros [Zeros]
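
The skewness warnings report γ1, the third standardized moment. As a rough illustration of what that number measures, the sketch below computes the uncorrected (population) version in plain Python. The function name `skewness` is made up for this example, and libraries such as pandas apply a small-sample bias correction, so their output can differ slightly from this formula on short inputs.

```python
from math import sqrt


def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / (m2 * sqrt(m2))


symmetric = [1, 2, 3, 4, 5]      # evenly spread, no tail
right_tailed = [0, 0, 0, 0, 10]  # mostly zeros with one large value

print(skewness(symmetric))
print(skewness(right_tailed))
```

A column dominated by zeros with a few very large values, like the warned columns here, produces a large positive γ1 for the same reason the second toy list does.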

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct count: 149999
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 74999.56203041354
Minimum: 0
Maximum: 149999
Zeros: 1
Zeros (%): < 0.1%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 7499.9
Q1: 37499.5
Median: 75000
Q3: 112499.5
95-th percentile: 142499.1
Maximum: 149999
Range: 149999
Interquartile range (IQR): 75000

Descriptive statistics

Standard deviation: 43301.5522
Coefficient of variation (CV): 0.5773574009
Kurtosis: -1.200010906
Mean: 74999.56203
Median Absolute Deviation (MAD): 37500
Skewness: -4.231478641e-06
Sum: 1.124985930e+10
Variance: 1875024423
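
Several of the statistics listed for each variable are simple functions of the others. The sketch below re-derives four of them from the df_index figures using the textbook definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum, variance = standard deviation squared). It is a consistency check under those standard definitions, not a description of how the report computes them internally.

```python
# Figures copied from the df_index statistics.
mean = 74999.56203
std = 43301.5522
q1, q3 = 37499.5, 112499.5
minimum, maximum = 0, 149999

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance from the standard deviation

print(f"CV       = {cv:.10f}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
print(f"Variance = {variance:.0f}")
```

The results agree with the reported values up to the rounding of the printed inputs.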
Histogram with fixed size bins (bins=10)
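Common values

Value | Count | Frequency (%)
2047 | 1 | < 0.1%
113949 | 1 | < 0.1%
15661 | 1 | < 0.1%
13612 | 1 | < 0.1%
3371 | 1 | < 0.1%
1322 | 1 | < 0.1%
7465 | 1 | < 0.1%
5416 | 1 | < 0.1%
27943 | 1 | < 0.1%
25894 | 1 | < 0.1%
Other values (149989) | 149989 | > 99.9%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
149999 | 1 | < 0.1%
149998 | 1 | < 0.1%
149997 | 1 | < 0.1%
149996 | 1 | < 0.1%
149995 | 1 | < 0.1%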

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct count: 149999
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75000.56203041354
Minimum: 1
Maximum: 150000
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 7500.9
Q1: 37500.5
Median: 75001
Q3: 112500.5
95-th percentile: 142500.1
Maximum: 150000
Range: 149999
Interquartile range (IQR): 75000

Descriptive statistics

Standard deviation: 43301.5522
Coefficient of variation (CV): 0.5773497028
Kurtosis: -1.200010906
Mean: 75000.56203
Median Absolute Deviation (MAD): 37500
Skewness: -4.231478641e-06
Sum: 1.12500093e+10
Variance: 1875024423
Histogram with fixed size bins (bins=10)
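Common values

Value | Count | Frequency (%)
2047 | 1 | < 0.1%
109855 | 1 | < 0.1%
11567 | 1 | < 0.1%
9518 | 1 | < 0.1%
15661 | 1 | < 0.1%
13612 | 1 | < 0.1%
3371 | 1 | < 0.1%
1322 | 1 | < 0.1%
7465 | 1 | < 0.1%
5416 | 1 | < 0.1%
Other values (149989) | 149989 | > 99.9%

Minimum 5 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
150000 | 1 | < 0.1%
149999 | 1 | < 0.1%
149998 | 1 | < 0.1%
149997 | 1 | < 0.1%
149996 | 1 | < 0.1%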

SeriousDlqin2yrs
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value | Count | Frequency (%)
0 | 139973 | 93.3%
1 | 10026 | 6.7%

RevolvingUtilizationOfUnsecuredLines
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 125728
Unique (%): 83.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.048471711145627
Minimum: 0.0
Maximum: 50708.0
Zeros: 10878
Zeros (%): 7.3%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.029866918
Median: 0.154175793
Q3: 0.5590437525
95-th percentile: 0.9999999
Maximum: 50708
Range: 50708
Interquartile range (IQR): 0.5291768345

Descriptive statistics

Standard deviation: 249.7562028
Coefficient of variation (CV): 41.29244787
Kurtosis: 14544.61645
Mean: 6.048471711
Median Absolute Deviation (MAD): 0.148322474
Skewness: 97.63124905
Sum: 907264.7082
Variance: 62378.16084
Histogram with fixed size bins (bins=10)
Common values

Value | Count | Frequency (%)
0 | 10878 | 7.3%
0.9999999 | 10255 | 6.8%
1 | 17 | < 0.1%
0.9500998 | 8 | < 0.1%
0.71314741 | 6 | < 0.1%
0.007984032 | 6 | < 0.1%
0.954091816 | 6 | < 0.1%
0.796407186 | 5 | < 0.1%
0.850299401 | 5 | < 0.1%
0.538922156 | 5 | < 0.1%
Other values (125718) | 128808 | 85.9%

Minimum 5 values

Value | Count | Frequency (%)
0 | 10878 | 7.3%
8.37e-06 | 1 | < 0.1%
9.93e-06 | 1 | < 0.1%
1.25e-05 | 1 | < 0.1%
1.43e-05 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
50708 | 1 | < 0.1%
29110 | 1 | < 0.1%
22198 | 1 | < 0.1%
22000 | 1 | < 0.1%
20514 | 1 | < 0.1%

age
Real number (ℝ≥0)

Distinct count: 85
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 52.295555303702024
Minimum: 21
Maximum: 109
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 21
5-th percentile: 29
Q1: 41
Median: 52
Q3: 63
95-th percentile: 78
Maximum: 109
Range: 88
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 14.77129796
Coefficient of variation (CV): 0.2824580001
Kurtosis: -0.4953320655
Mean: 52.2955553
Median Absolute Deviation (MAD): 11
Skewness: 0.1892426318
Sum: 7844281
Variance: 218.1912435
Histogram with fixed size bins (bins=10)
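Common values

Value | Count | Frequency (%)
49 | 3837 | 2.6%
48 | 3806 | 2.5%
50 | 3753 | 2.5%
47 | 3719 | 2.5%
63 | 3719 | 2.5%
46 | 3714 | 2.5%
53 | 3648 | 2.4%
51 | 3627 | 2.4%
52 | 3609 | 2.4%
56 | 3589 | 2.4%
Other values (75) | 112978 | 75.3%

Minimum 5 values

Value | Count | Frequency (%)
21 | 183 | 0.1%
22 | 434 | 0.3%
23 | 641 | 0.4%
24 | 816 | 0.5%
25 | 953 | 0.6%

Maximum 5 values

Value | Count | Frequency (%)
109 | 2 | < 0.1%
107 | 1 | < 0.1%
105 | 1 | < 0.1%
103 | 3 | < 0.1%
102 | 3 | < 0.1%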

NumberOfTime30-59DaysPastDueNotWorse
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 16
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4210294735298235
Minimum: 0
Maximum: 98
Zeros: 126018
Zeros (%): 84.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 98
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.192794982
Coefficient of variation (CV): 9.958435799
Kurtosis: 522.3732593
Mean: 0.4210294735
Median Absolute Deviation (MAD): 0
Skewness: 22.59703929
Sum: 63154
Variance: 17.57952976
Histogram with fixed size bins (bins=10)
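Common values

Value | Count | Frequency (%)
0 | 126018 | 84.0%
1 | 16032 | 10.7%
2 | 4598 | 3.1%
3 | 1754 | 1.2%
4 | 747 | 0.5%
5 | 342 | 0.2%
98 | 264 | 0.2%
6 | 140 | 0.1%
7 | 54 | < 0.1%
8 | 25 | < 0.1%
Other values (6) | 25 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 126018 | 84.0%
1 | 16032 | 10.7%
2 | 4598 | 3.1%
3 | 1754 | 1.2%
4 | 747 | 0.5%

Maximum 5 values

Value | Count | Frequency (%)
98 | 264 | 0.2%
96 | 5 | < 0.1%
13 | 1 | < 0.1%
12 | 2 | < 0.1%
11 | 1 | < 0.1%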

DebtRatio
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 114193
Unique (%): 76.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 353.0074262338636
Minimum: 0.0
Maximum: 329664.0
Zeros: 4113
Zeros (%): 2.7%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.004329004
Q1: 0.1750736325
Median: 0.366503221
Q3: 0.8682570065
95-th percentile: 2449
Maximum: 329664
Range: 329664
Interquartile range (IQR): 0.693183374

Descriptive statistics

Standard deviation: 2037.825113
Coefficient of variation (CV): 5.772754229
Kurtosis: 13734.20232
Mean: 353.0074262
Median Absolute Deviation (MAD): 0.245720104
Skewness: 95.15750074
Sum: 52950760.93
Variance: 4152731.19
Histogram with fixed size bins (bins=10)
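Common values

Value | Count | Frequency (%)
0 | 4113 | 2.7%
1 | 229 | 0.2%
4 | 174 | 0.1%
2 | 170 | 0.1%
3 | 162 | 0.1%
5 | 143 | 0.1%
9 | 125 | 0.1%
10 | 117 | 0.1%
7 | 115 | 0.1%
13 | 114 | 0.1%
Other values (114183) | 144537 | 96.4%

Minimum 5 values

Value | Count | Frequency (%)
0 | 4113 | 2.7%
2.6e-05 | 1 | < 0.1%
3.69e-05 | 1 | < 0.1%
3.93e-05 | 1 | < 0.1%
6.62e-05 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
329664 | 1 | < 0.1%
326442 | 1 | < 0.1%
307001 | 1 | < 0.1%
220516 | 1 | < 0.1%
168835 | 1 | < 0.1%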

MonthlyIncome
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 13584
Unique (%): 9.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5587.847976319842
Minimum: 0.0
Maximum: 500000.0
Zeros: 1634
Zeros (%): 1.1%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 820
Q1: 1800
Median: 4357
Q3: 7400
95-th percentile: 13500
Maximum: 500000
Range: 500000
Interquartile range (IQR): 5600

Descriptive statistics

Standard deviation: 7778.687841
Coefficient of variation (CV): 1.392072203
Kurtosis: 1608.138244
Mean: 5587.847976
Median Absolute Deviation (MAD): 2557
Skewness: 29.41205776
Sum: 838171608.6
Variance: 60507984.52
Histogram with fixed size bins (bins=10)
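Common values

Value | Count | Frequency (%)
1800 | 12748 | 8.5%
1328.6 | 12346 | 8.2%
820 | 5260 | 3.5%
5000 | 2757 | 1.8%
4000 | 2106 | 1.4%
6000 | 1933 | 1.3%
3000 | 1758 | 1.2%
0 | 1634 | 1.1%
2500 | 1551 | 1.0%
10000 | 1466 | 1.0%
Other values (13574) | 106440 | 71.0%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1634 | 1.1%
1 | 605 | 0.4%
2 | 6 | < 0.1%
4 | 2 | < 0.1%
5 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
500000 | 12 | < 0.1%
440000 | 1 | < 0.1%
428250 | 1 | < 0.1%
408333 | 1 | < 0.1%
324000 | 1 | < 0.1%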

NumberOfOpenCreditLinesAndLoans
Real number (ℝ≥0)

ZEROS

Distinct count: 58
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.452776351842346
Minimum: 0
Maximum: 58
Zeros: 1888
Zeros (%): 1.3%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 5
Median: 8
Q3: 11
95-th percentile: 18
Maximum: 58
Range: 58
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 5.145964246
Coefficient of variation (CV): 0.6087898262
Kurtosis: 3.091028799
Mean: 8.452776352
Median Absolute Deviation (MAD): 3
Skewness: 1.215303679
Sum: 1267908
Variance: 26.48094802
Histogram with fixed size bins (bins=10)
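Common values

Value | Count | Frequency (%)
6 | 13613 | 9.1%
7 | 13245 | 8.8%
5 | 12931 | 8.6%
8 | 12562 | 8.4%
4 | 11609 | 7.7%
9 | 11355 | 7.6%
10 | 9624 | 6.4%
3 | 9058 | 6.0%
11 | 8321 | 5.5%
12 | 7005 | 4.7%
Other values (48) | 40676 | 27.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1888 | 1.3%
1 | 4438 | 3.0%
2 | 6666 | 4.4%
3 | 9058 | 6.0%
4 | 11609 | 7.7%

Maximum 5 values

Value | Count | Frequency (%)
58 | 1 | < 0.1%
57 | 2 | < 0.1%
56 | 2 | < 0.1%
54 | 4 | < 0.1%
53 | 1 | < 0.1%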

NumberRealEstateLoansOrLines
Real number (ℝ≥0)

ZEROS

Distinct count: 28
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0182334548896992
Minimum: 0
Maximum: 54
Zeros: 56188
Zeros (%): 37.5%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2
95-th percentile: 3
Maximum: 54
Range: 54
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.129771907
Coefficient of variation (CV): 1.109541139
Kurtosis: 60.47710052
Mean: 1.018233455
Median Absolute Deviation (MAD): 1
Skewness: 3.482511689
Sum: 152734
Variance: 1.276384562
Histogram with fixed size bins (bins=10)
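Common values

Value | Count | Frequency (%)
0 | 56188 | 37.5%
1 | 52338 | 34.9%
2 | 31521 | 21.0%
3 | 6300 | 4.2%
4 | 2170 | 1.4%
5 | 689 | 0.5%
6 | 320 | 0.2%
7 | 171 | 0.1%
8 | 93 | 0.1%
9 | 78 | 0.1%
Other values (18) | 131 | 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 56188 | 37.5%
1 | 52338 | 34.9%
2 | 31521 | 21.0%
3 | 6300 | 4.2%
4 | 2170 | 1.4%

Maximum 5 values

Value | Count | Frequency (%)
54 | 1 | < 0.1%
32 | 1 | < 0.1%
29 | 1 | < 0.1%
26 | 1 | < 0.1%
25 | 3 | < 0.1%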

NumberOfDependents
Real number (ℝ≥0)

MISSING
ZEROS

Distinct count: 9
Unique (%): < 0.1%
Missing: 3924
Missing (%): 2.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7569946945062468
Minimum: 0.0
Maximum: 8.0
Zeros: 86902
Zeros (%): 57.9%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.113064882
Coefficient of variation (CV): 1.470373426
Kurtosis: 2.216483236
Mean: 0.7569946945
Median Absolute Deviation (MAD): 0
Skewness: 1.542175099
Sum: 110578
Variance: 1.238913432
Histogram with fixed size bins (bins=10)
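Common values

Value | Count | Frequency (%)
0 | 86902 | 57.9%
1 | 26316 | 17.5%
2 | 19521 | 13.0%
3 | 9483 | 6.3%
4 | 2862 | 1.9%
5 | 746 | 0.5%
6 | 158 | 0.1%
7 | 51 | < 0.1%
8 | 36 | < 0.1%
(Missing) | 3924 | 2.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 86902 | 57.9%
1 | 26316 | 17.5%
2 | 19521 | 13.0%
3 | 9483 | 6.3%
4 | 2862 | 1.9%

Maximum 5 values

Value | Count | Frequency (%)
8 | 36 | < 0.1%
7 | 51 | < 0.1%
6 | 158 | 0.1%
5 | 746 | 0.5%
4 | 2862 | 1.9%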

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r (only the sign of the slope does).

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
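
As a minimal sketch of that calculation in plain Python (the function name `pearson_r` is chosen for this example and is not taken from the report's tooling):

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (spread_x * spread_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

print(pearson_r(x, y))
# Shifting and rescaling y leaves r unchanged (location/scale invariance).
print(pearson_r(x, [10 * v + 7 for v in y]))
```

The two printed values agree up to floating-point error, which is the invariance property described above.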

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
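
A minimal sketch of this two-step procedure in plain Python follows. The helper names `average_ranks` and `spearman_rho` are made up for this example, and giving tied values the average of their ranks is an assumed (though common) convention that the text above does not specify.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (spread_x * spread_y)


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing

print(spearman_rho(x, y))
```

Because the cubic relationship is strictly increasing, the ranks of x and y coincide and ρ comes out as 1 even though the relationship is not linear.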

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
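
The pair-counting definition above can be sketched directly in plain Python. The function name `kendall_tau_a` is made up for this example. It implements the formula exactly as worded (often called tau-a), where tied pairs count as neither concordant nor discordant; statistical libraries commonly default to a tie-adjusted variant (tau-b), so their results can differ on columns with many repeated values, such as the zero-heavy ones in this dataset.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, by direct counting."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1  # both variables move the same way
        elif direction < 0:
            discordant += 1  # the variables move in opposite ways
        # direction == 0: a tie in x or y, counted as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3], [1, 2, 3]))
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

This direct count is quadratic in the number of observations, so it is meant only to illustrate the definition on small inputs.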

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values


Sample

First rows

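(index) | df_index | Unnamed: 0 | SeriousDlqin2yrs | RevolvingUtilizationOfUnsecuredLines | age | NumberOfTime30-59DaysPastDueNotWorse | DebtRatio | MonthlyIncome | NumberOfOpenCreditLinesAndLoans | NumberRealEstateLoansOrLines | NumberOfDependents
0 | 0 | 1 | 1 | 0.766127 | 45 | 2 | 0.802982 | 9120.0 | 13 | 6 | 2.0
1 | 1 | 2 | 0 | 0.957151 | 40 | 0 | 0.121876 | 2600.0 | 4 | 0 | 1.0
2 | 2 | 3 | 0 | 0.658180 | 38 | 1 | 0.085113 | 3042.0 | 2 | 0 | 0.0
3 | 3 | 4 | 0 | 0.233810 | 30 | 0 | 0.036050 | 3300.0 | 5 | 0 | 0.0
4 | 4 | 5 | 0 | 0.907239 | 49 | 1 | 0.024926 | 63588.0 | 7 | 1 | 0.0
5 | 5 | 6 | 0 | 0.213179 | 74 | 0 | 0.375607 | 3500.0 | 3 | 1 | 1.0
6 | 6 | 7 | 0 | 0.305682 | 57 | 0 | 5710.000000 | 1800.0 | 8 | 3 | 0.0
7 | 7 | 8 | 0 | 0.754464 | 39 | 0 | 0.209940 | 3500.0 | 8 | 0 | 0.0
8 | 8 | 9 | 0 | 0.116951 | 27 | 0 | 46.000000 | 820.0 | 2 | 0 | NaN
9 | 9 | 10 | 0 | 0.189169 | 57 | 0 | 0.606291 | 23684.0 | 9 | 4 | 2.0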

Last rows

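(index) | df_index | Unnamed: 0 | SeriousDlqin2yrs | RevolvingUtilizationOfUnsecuredLines | age | NumberOfTime30-59DaysPastDueNotWorse | DebtRatio | MonthlyIncome | NumberOfOpenCreditLinesAndLoans | NumberRealEstateLoansOrLines | NumberOfDependents
149989 | 149990 | 149991 | 0 | 0.055518 | 46 | 0 | 0.609779 | 4335.0 | 7 | 1 | 2.0
149990 | 149991 | 149992 | 0 | 0.104112 | 59 | 0 | 0.477658 | 10316.0 | 10 | 2 | 0.0
149991 | 149992 | 149993 | 0 | 0.871976 | 50 | 0 | 4132.000000 | 1800.0 | 11 | 1 | 3.0
149992 | 149993 | 149994 | 0 | 1.000000 | 22 | 0 | 0.000000 | 820.0 | 1 | 0 | 0.0
149993 | 149994 | 149995 | 0 | 0.385742 | 50 | 0 | 0.404293 | 3400.0 | 7 | 0 | 0.0
149994 | 149995 | 149996 | 0 | 0.040674 | 74 | 0 | 0.225131 | 2100.0 | 4 | 1 | 0.0
149995 | 149996 | 149997 | 0 | 0.299745 | 44 | 0 | 0.716562 | 5584.0 | 4 | 1 | 2.0
149996 | 149997 | 149998 | 0 | 0.246044 | 58 | 0 | 3870.000000 | 1800.0 | 18 | 1 | 0.0
149997 | 149998 | 149999 | 0 | 0.000000 | 30 | 0 | 0.000000 | 5716.0 | 4 | 0 | 0.0
149998 | 149999 | 150000 | 0 | 0.850283 | 64 | 0 | 0.249908 | 8158.0 | 8 | 2 | 0.0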